How to Conduct Rigorous 
Evaluations of Mathematics and 
Science Partnerships (MSP) 
Projects 

A User-Friendly Guide for MSP Project Officials 
and Evaluators 



Coalition for Evidence-Based Policy

A Project Sponsored by the Council for Excellence in Government



NORC
A national organization for research at the University of Chicago




August 2005 



This publication was produced by the Coalition for 
Evidence-Based Policy, in partnership with the National
Opinion Research Center (NORC) at the University of 
Chicago, under a contract with the Institute of Education 
Sciences. The Coalition is sponsored by the Council for 
Excellence in Government (www.excelgov.org/evidence).
The views expressed herein are those of the contractor. 

This publication was funded by the U.S. Department of
Education, Institute of Education Sciences (Contract
#ED-01-CO-0028/0001).

This publication is in the public domain. Authorization to 
reproduce it in whole or in part for educational purposes is 
granted. 



PURPOSE AND OVERVIEW OF THIS GUIDE 



Purpose: To provide MSP project officials and evaluators with clear, practical advice on
how to conduct rigorous evaluations of MSP projects at low cost. 

Specifically, this is a how-to Guide designed to enable MSP grantees and evaluators of MSP 
projects to answer questions about the projects' impact, such as: "Does this MSP project 
improve student math and science achievement and increase teacher content knowledge; if so, 
by how much?" The resulting knowledge about "what works" can then be used by schools and 
districts as an effective, valid tool in ensuring: 

(i) that their math and science teachers are highly qualified, and 

(ii) that their students are proficient in math and science, 

both of which are central goals of American education policy. 

The advice in this Guide is targeted primarily to two audiences: 

(i) MSP grantees (or grant applicants) that are working with an evaluator to conduct a 
rigorous single-site evaluation (i.e., evaluation of the grantee's single MSP project);
and 

(ii) Evaluators that have been selected by a state to conduct a rigorous cross-site 
evaluation (i.e., evaluation of multiple MSP projects that are implementing a specific, 
well-defined MSP model, such as the Chicago Math and Science Initiative).

We have also prepared a companion Guide for MSP state coordinators - How to Solicit Rigorous
Evaluations of MSP Projects - at http://www.ed.gov/programs/mathsci/issuebrief.doc.

Our advice is designed to enable MSP projects and evaluators to conduct rigorous impact
evaluations that (i) are low in cost (e.g., $50,000 - $75,000 for a single-site evaluation), and 
(ii) produce valid, actionable knowledge about what works within 1-2 years. 

Overview: This Guide provides concrete, step-by-step advice, as follows:

Step 1: Find a researcher with expertise in conducting rigorous impact evaluations to
include on the study team.

Step 2: Decide what research question(s) the study seeks to answer.

Step 3: Decide on the study design.

Step 4: Gain the cooperation of teachers and school officials.

Step 5: Allocate teachers to the program and control (or matched comparison) groups.

Step 6: Collect the data needed to measure the MSP project’s effectiveness.

Step 7: Analyze and report the study’s results.







STEP 1: FIND A RESEARCHER WITH EXPERTISE IN
CONDUCTING RIGOROUS EVALUATIONS TO INCLUDE ON THE
STUDY TEAM



This may be the most important step in the process. This Guide provides general principles in 
conducting these evaluations; however, a researcher with hands-on experience and demonstrated 
success in conducting such studies is needed to address specific operational questions as they arise
over the course of the study. Such a researcher might participate as the principal investigator or other 
study team member, or as a key consultant to the team. 

To identify such a researcher, we suggest that:

A. You contact the authors of previous well-designed impact evaluations to explore their 
interest in participating in the study and/or their recommendations of colleagues to do so. To
find such authors, you can go to websites that summarize findings from well-designed
randomized controlled trials (the strongest study design) and well-matched comparison-group
studies, such as the sites listed in Appendix A. These sites all identify the study authors - 
including many capable mid-level researchers who might participate on your study team at 
modest cost. In some cases, the sites also list the authors' contact information. 

Another useful place to look for such researchers is the What Works Clearinghouse's Registry of 
Outcome Evaluators, at http://whatworks.ed.gov/technicalassistance/EvlSearch.asp. Keep in
mind that the evaluators on this list are self-nominated; therefore, a careful review is needed to 
identify those who may have previously conducted a well-designed impact evaluation. 

B. You ask the prospective researchers that you so identify for a plan that describes, in clear, 
nontechnical language, how they would carry out Steps 2-7 in this Guide. 

We suggest you review the plan to determine whether it: 

■ Asks clear research questions about the MSP project's effect on student achievement
and, if appropriate, teacher content knowledge (Step 2); 

■ Includes a strong study design - preferably a randomized controlled trial or, if not 
possible, a well-matched comparison-group study (Step 3A);

■ Includes a sound, workable approach to recruiting the minimum sample of teachers 
(Step 3B), gaining the cooperation of teachers and school officials (Step 4), and 
allocating teachers to program and control (or matched comparison) groups (Step 5); 

■ Includes outcome measures (e.g., student achievement scores) that will enable you to 
answer the study's research questions (Step 3D), and a sound, workable approach to
collecting outcome and other data (Step 6); and 

■ Includes a sound, workable approach to analyzing the study results, so as to obtain 
valid estimates of the MSP project's effects on key outcomes (Step 7). 







C. You ask the prospective researchers for references, such as other researchers or school 
officials who participated in the researcher's previous studies. 

We suggest you ask the references for evidence that the researcher: (i) has played a central 
role in an earlier rigorous evaluation; (ii) successfully handled that role; and (if the researcher 
is being considered for principal investigator) (iii) has the organizational and interpersonal 
skills to manage the operation of a study, staying within budget and schedule. 



STEP 2: DECIDE WHAT RESEARCH QUESTION(S) THE STUDY
SEEKS TO ANSWER.



We suggest you choose just a few well-defined research questions - ones which (i) will yield 
valuable, actionable knowledge about "what works" in the MSP program, and (ii) can preferably be 
answered at low or modest cost. More specifically:

A. We suggest that the study seek to evaluate a specific, well-defined MSP approach. 

For example, suppose that your MSP project(s) provides one training program to elementary 
school teachers, and a different training program - with a distinct curriculum - to middle 
school teachers. In this case, we suggest that your study evaluate one of the training programs 
or the other, but not both (unless resources permit you to conduct a study of each, with the 
appropriate sample size of teachers in each study). This is because if you evaluate the two 
together, you may be able to determine whether the two together are effective, but you will 
probably not be able to determine, with statistical confidence, whether it was one program or 
the other or both that produced the study's overall finding. You may also not be able to find 
common outcome measures for the two different programs - a problem which may well prevent 
the study from producing meaningful results. 

B. We suggest that one of the research questions address the effect of the MSP project(s) on 
student math and/or science achievement, since (i) a key goal of the MSP program is to 
increase student achievement by enhancing the knowledge and skills of their teachers, and (ii) 
this effect can often be measured at very low cost, using test scores that schools already 
collect for other purposes. 

Specifically, you can often measure the effect on student math achievement at low cost 
because many states now test mathematics achievement annually, especially in the early 
grades. Testing of students' science achievement, however, is less common. Thus one 
possible low-cost approach would be to choose a research question about the project(s)' effect
on math, but not science, achievement (and address science achievement if additional 
resources become available for that purpose). For example: 

"Does the MSP project between school district X and college Y increase student math 
achievement in grades 2-5, compared to a control group? If so, by how much?" 







The overall cost of a single-site RCT addressing this central policy question may be as low as
$50,000 - $75,000, if the study measures achievement using test scores that schools already
collect.

C. An additional research question might be the effect of the MSP project(s) on teacher 
content knowledge, since improving such knowledge is a key intermediate goal of the MSP 
program. For example: 

"Does the MSP project increase the math and/or science content knowledge of
middle-school science teachers, compared to a control group? If so, by how much?"

However, measuring the effect on teacher content knowledge poses two additional challenges: 

■ It requires a larger sample of teachers to identify an effective project, for reasons 
discussed in the companion Guide for MSP state coordinators on soliciting rigorous MSP 
evaluations (http://www.ed.gov/programs/mathsci/issuebrief.doc, page 9).
Suggestions for addressing the sample size issue are discussed in Step 3B, below. 

■ It requires that a survey or other instrument be administered to teachers to assess 
their content knowledge, which raises the study's cost. 

D. If resources permit, you may also wish to choose research questions about the longer- 
term effect of the MSP project(s). 

These might include, for example, questions about whether the effect of teachers' MSP 
training on their students' achievement increases or decreases in successive school years - 
e.g., because teachers need time to fully incorporate their new knowledge into classroom 
instruction, or alternatively because over time they forget what they learned. Thus, you might 
include research questions about the effect of teachers' MSP training in the summer of 2006 on 
the achievement of their 2006-2007 students, and on the achievement of their 2007-2008 
students. 

You might also ask research questions about the effect on students' educational performance 
over a 2-3 year period (e.g., math/science test scores, enrollment in higher-level
math/science courses, grade retentions and special education placements).

E. If resources permit, you might also choose one or two research questions relating to MSP 
project implementation. For example: 

"How many hours of training did teachers in the MSP project receive on average, and what 
percentage of teachers completed all MSP training sessions?"

"To what extent did MSP trainers cover the key items in the MSP curriculum?"

"What non-MSP professional development programs do teachers in the control group
participate in, what training curriculum is used, and how many hours of such training do
they receive on average?"






Such research questions can help you interpret the study findings, identifying (i) possible 
reasons why the project is effective or ineffective, and (ii) if effective, likely ingredients 
needed to replicate it successfully in other sites. 
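
As an illustration of how the first implementation question above might be answered from attendance records, the following is a minimal sketch. The attendance log, the teacher IDs, and the rule that a teacher "completed" a session by attending more than zero hours are all hypothetical assumptions made for this example, not part of the MSP program's requirements.

```python
from statistics import mean

# Hypothetical attendance log: teacher ID -> hours attended at each session.
# A value of 0 means the teacher missed that session (an assumption of this sketch).
attendance = {
    "T01": [6, 6, 6, 6],
    "T02": [6, 0, 6, 6],
    "T03": [6, 6, 6, 6],
    "T04": [6, 6, 3, 0],
}

def summarize_attendance(log):
    """Return (average total hours per teacher, percent attending every session)."""
    total_hours = [sum(hours) for hours in log.values()]
    completed_all = [all(h > 0 for h in hours) for hours in log.values()]
    avg_hours = mean(total_hours)
    pct_completed = 100 * sum(completed_all) / len(log)
    return avg_hours, pct_completed

avg_hours, pct_completed = summarize_attendance(attendance)
print(f"Average training hours per teacher: {avg_hours:.2f}")
print(f"Teachers completing all sessions: {pct_completed:.0f}%")
```

The same tabulation could be run separately for the control group's non-MSP training, so that the two groups' training exposure can be compared side by side.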



STEP 3: DECIDE ON THE STUDY DESIGN. 



This includes the following sub-steps: (A) Deciding whether to use a randomized controlled trial or 
well-matched comparison-group study design; (B) Deciding on the sample size (i.e., number of teachers 
to include in the study); (C) Deciding how teachers will be recruited into the study, and allocated 
between program and control (or matched comparison) groups; and (D) Deciding how to measure 
project outcomes.

A. Deciding on overall design: We strongly suggest a randomized controlled trial (RCT), and
would recommend a well-matched comparison-group design only if an RCT is truly 
infeasible. 

If at all possible, we suggest that you conduct an RCT, in which teachers (plus their classes) are 
randomly assigned to the MSP project or to a control group. As discussed in the companion 
Guide for MSP state coordinators (http://www.ed.gov/programs/mathsci/issuebrief.doc, pages
13-16), well-designed RCTs are considered the gold standard for measuring a program's
impact, based on persuasive evidence that (i) they are superior to other evaluation methods in
estimating a program's true effect; and (ii) the most commonly-used nonrandomized methods often
produce erroneous conclusions and can lead to practices that are ineffective or harmful.

Some schools and/or teachers may have concerns about randomly assigning some teachers to a
control group that will not participate in the MSP project. In most cases, a persistent research 
team can assuage their concerns and gain their cooperation in the RCT using approaches such 
as those discussed in Step 4 of this Guide. 

We suggest that you consider a well-matched comparison-group study only if you have
exhausted all possible options for conducting an RCT and conclude that it is truly not feasible. 

A well-matched comparison-group study compares outcomes for teachers (plus their classes) 
that participate in the MSP project against outcomes for a comparison group of teachers that 
(i) are chosen through methods other than randomization, and (ii) are closely matched with the 
MSP teachers in their students' initial achievement levels and other characteristics (as 
discussed further in Step 3C). 

Among nonrandomized studies, comparison-group studies with such careful matching are the
most likely to generate valid estimates of an intervention's true effect, but we suggest them 
only with reservation because they may still produce erroneous conclusions (as discussed in the 
companion Guide, http://www.ed.gov/programs/mathsci/issuebrief.doc, page 15).

B. Deciding on sample size:

1. To measure the effect on student achievement, a minimum sample of about 60
teachers, plus their classes, is needed. 







As discussed in the companion Guide for MSP state coordinators 
(http://www.ed.gov/programs/mathsci/issuebrief.doc, page 5), a minimum sample of
about 60 teachers is needed for an MSP evaluation (single-site or cross-site) to produce
strong evidence about an MSP project's effect on student math or science achievement. 

Of the 60, 30 teachers plus their classes would be randomly assigned to the MSP program 
group, and 30 would be randomly assigned to the control group. (Similarly, in a matched 
comparison-group study, 30 would be included in the program group and 30 in the 
comparison group). 1 

Many individual MSP projects have enough math and/or science teachers to meet these
sample size requirements, in which case a rigorous single-site evaluation is feasible.
However, some smaller MSP projects may not, by themselves, have enough teachers. Such
projects may therefore wish to partner with other MSP projects to carry out a cross-site
evaluation with at least 60 teachers. Projects partnering in this way would need to
implement the same MSP model (e.g., the same summer institute program providing the
same teacher training), for reasons discussed above (Step 2A).

2. To measure the effect on teacher content knowledge, a larger sample of teachers is 
needed (e.g., 90). 

The reasons for the larger sample requirement are discussed in the companion Guide for
MSP state coordinators (http://www.ed.gov/programs/mathsci/issuebrief.doc, page 9). If
your MSP project does not have enough teachers to meet this requirement, yet you still
wish to measure its effect on teacher content knowledge, we suggest two possible courses 
of action. 

First, you could partner with other MSP projects to carry out a cross-site evaluation with at 
least 90 teachers. Alternatively, you could go ahead and measure the effect on teacher 
content knowledge anyway, using a smaller sample (e.g., 60 teachers). It is important to 
recognize, however, that such an approach (i) is likely to show statistically-significant 
effects on teacher content knowledge only for MSP projects that are highly effective, and 
(ii) in other cases, may show effects that do not reach statistical significance and therefore 
constitute suggestive evidence, rather than strong evidence. Such suggestive evidence is 
useful in generating hypotheses to test in larger evaluations in the future. 
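
The trade-off between sample size and the size of effect a study can detect can be illustrated with a rough calculation. The sketch below is a textbook approximation for a simple two-group comparison of teacher-level scores, assuming a two-sided test at the 5 percent level, 80 percent power, equal group sizes, and no baseline covariates. These settings and the function name are assumptions made for illustration; the sketch does not reproduce the calculations behind the figures of 60 and 90 teachers in the companion Guide.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_effect(n_teachers, alpha=0.05, power=0.80, share_program=0.5):
    """Approximate smallest true effect, in standard deviation units, that a
    two-group comparison of teacher-level scores is likely to detect.

    Assumes a two-sided test, a normal approximation, and no covariates.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    standard_error = sqrt(1 / (share_program * (1 - share_program) * n_teachers))
    return multiplier * standard_error

for n in (60, 90):
    print(n, "teachers:", round(minimum_detectable_effect(n), 2), "standard deviations")
```

Under these assumptions the detectable effect shrinks as the sample grows, which is why a smaller sample is likely to reach statistical significance only for highly effective projects. A researcher with RCT expertise can refine the calculation for your study, for example by accounting for pretest scores.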

C. Deciding how to recruit and allocate teachers:

1. The evaluator should preferably recruit teachers into the study using the same
process the MSP project would use in the absence of a study. 

This will help ensure that the evaluation is measuring the effectiveness of the MSP project 
as it is normally implemented. So, for example, if the MSP project would normally recruit
teachers by publicizing its training program and asking teachers to volunteer, the study 
should preferably do the same. Alternatively, if the MSP project would normally ask school 
principals to designate teachers to participate, the study should recruit in the same way. 

In a cross-site evaluation (i.e., evaluation of several MSP projects all implementing the
same MSP model), we suggest you recruit teachers from each MSP project roughly in 
proportion to the size of the project. 
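
One way to turn "roughly in proportion" into recruitment targets is sketched below. The project names, the project sizes, and the choice of largest-remainder rounding are hypothetical assumptions made for this example; any reasonable rounding rule would serve.

```python
def proportional_allocation(project_sizes, total_sample):
    """Split total_sample across projects in proportion to project size,
    using largest-remainder rounding so the targets sum exactly to total_sample."""
    total_size = sum(project_sizes.values())
    exact = {p: total_sample * size / total_size for p, size in project_sizes.items()}
    counts = {p: int(share) for p, share in exact.items()}  # round down first
    leftover = total_sample - sum(counts.values())
    # Hand the remaining slots to the projects with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda p: exact[p] - counts[p], reverse=True)
    for p in by_remainder[:leftover]:
        counts[p] += 1
    return counts

# Hypothetical numbers of eligible teachers in three partnering projects.
sizes = {"Project A": 100, "Project B": 70, "Project C": 40}
targets = proportional_allocation(sizes, total_sample=60)
print(targets)
```

With these made-up sizes, the targets come to 29, 20, and 11 teachers, which sum to the 60-teacher sample.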






2. The simplest way to allocate teachers in an RCT: Apply random assignment to the
entire sample, ensuring equal numbers in the program and control groups. 

For example, if you have recruited 60 teachers from several schools to participate in the 
study, the simplest approach would be to randomly assign 30 teachers to the MSP project 
and 30 to a control group, using computer software that ensures an equal number in each 
group. The random assignment will ensure, to a high degree of confidence, that there are 
no systematic differences between the two groups of teachers and their classes in initial 
student achievement levels and other characteristics. 
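
A minimal sketch of such an assignment follows, using Python's standard random module as one example of suitable software. The teacher IDs, the function name, and the seed value are hypothetical. Recording the seed is a design choice of this sketch that lets the assignment be reproduced and audited later.

```python
import random

def assign_equal_groups(teacher_ids, seed=None):
    """Randomly split teachers into program and control groups of equal size."""
    if len(teacher_ids) % 2 != 0:
        raise ValueError("An even number of teachers is needed for equal groups.")
    rng = random.Random(seed)      # a fixed seed makes the assignment reproducible
    shuffled = list(teacher_ids)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"program": sorted(shuffled[:half]), "control": sorted(shuffled[half:])}

# Hypothetical IDs for a recruited sample of 60 teachers.
teachers = [f"T{i:02d}" for i in range(1, 61)]
groups = assign_equal_groups(teachers, seed=2005)
print(len(groups["program"]), len(groups["control"]))
```

The assignment should be run once, after recruitment is complete, and the resulting lists kept unchanged for the remainder of the study.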

If, however, the teachers' classes have widely varying initial levels of math and/or science
achievement (e.g., for reasons discussed in endnote 1), it is possible that applying the 
above random assignment process to a sample of just 60 teachers may not produce two 
equivalent groups. In such circumstances, you may wish to consider "blocked" random 
assignment. Under this approach, you would group teachers into several "blocks" (e.g., 
those teaching high-achieving students, those teaching average-achieving students, and
those teaching low-achieving students), and then randomly assign teachers within each 
block to program and control groups. As an illustrative example, you might group the 
whole sample of 60 teachers into three blocks of 20, and then randomly assign 10 teachers 
in each block to the program group and 10 to the control group. 

In cases where teachers' classes vary widely in initial achievement, blocking may increase 
the study's ability to produce precise estimates of an MSP project's effects. (In other 
cases, it may actually do the reverse.) Therefore, you may wish to ask the member of your 
research team with expertise in RCTs whether blocking makes sense for your study. 2 
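
The blocked approach described above can be sketched as follows. The sketch assumes that blocks are formed by ranking teachers on their classes' baseline achievement and cutting the ranking into equal-sized groups. That rule, the made-up baseline scores, and the function name are assumptions made for illustration; your researcher may prefer a different way of defining blocks.

```python
import random

def blocked_assignment(baseline_scores, n_blocks=3, seed=None):
    """Rank teachers by their classes' initial achievement, cut the ranking into
    equal-sized blocks, and randomly assign half of each block to the program."""
    rng = random.Random(seed)
    ranked = sorted(baseline_scores, key=baseline_scores.get)  # lowest to highest
    block_size = len(ranked) // n_blocks
    if block_size * n_blocks != len(ranked) or block_size % 2 != 0:
        raise ValueError("This sketch needs equal-sized blocks of even size.")
    assignment = {}
    for b in range(n_blocks):
        block = ranked[b * block_size:(b + 1) * block_size]
        rng.shuffle(block)  # randomize the order of teachers within the block
        for i, teacher in enumerate(block):
            group = "program" if i < block_size // 2 else "control"
            assignment[teacher] = (b, group)
    return assignment

# Made-up baseline class averages for 60 hypothetical teachers.
data_rng = random.Random(1)
baseline = {f"T{i:02d}": data_rng.gauss(50, 10) for i in range(1, 61)}
assignment = blocked_assignment(baseline, n_blocks=3, seed=2005)
```

With 60 teachers and three blocks, each block of 20 contributes 10 teachers to the program group and 10 to the control group, matching the illustrative example in the text.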

3. In a comparison group study, select comparison-group teachers who are very closely 
matched with program group teachers, particularly in their classes' initial achievement 
level. 

We offer here a few general principles on matching, and suggest that you consult a 
researcher with expertise in such studies for further advice: 

■ Of primary importance, the comparison-group teachers should be very closely matched
with program group teachers in their classes' initial math and/or science achievement
level (whichever will be the main outcome of interest). 

■ Of secondary importance, the teachers in the program and comparison group should 
also be matched in the demographics and grade levels of their students, and 
geographical proximity (e.g., same school district). 

Statistical techniques such as propensity score matching may be used to accomplish such 
matching, but a full discussion of such techniques is beyond the scope of this Guide. 3 

Careful studies have shown that when a comparison-group study does not include close
matching in the above characteristics, the study is unlikely to generate accurate results 
even when statistical methods (such as regression adjustment) are used to correct for 
these differences in estimating the program's effect. 
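
To make the matching principles above concrete, the following is a simplified sketch of one-to-one matching. It is not propensity score matching. Instead it uses a simpler rule: exact matching on district and grade level, then the closest available baseline class average within a tolerance. The record layout, the 2-point tolerance (max_gap), and the sample data are hypothetical assumptions made for this example.

```python
def match_comparison_teachers(program, pool, max_gap=2.0):
    """Greedy one-to-one matching without replacement.

    Each teacher is a dict with keys "id", "district", "grade", "baseline".
    A comparison teacher is eligible only if district and grade agree and the
    baseline class averages differ by at most max_gap points.
    """
    available = list(pool)
    matches = {}
    for teacher in program:
        eligible = [
            c for c in available
            if c["district"] == teacher["district"]
            and c["grade"] == teacher["grade"]
            and abs(c["baseline"] - teacher["baseline"]) <= max_gap
        ]
        if not eligible:
            matches[teacher["id"]] = None  # no acceptable match; flag for review
            continue
        best = min(eligible, key=lambda c: abs(c["baseline"] - teacher["baseline"]))
        matches[teacher["id"]] = best["id"]
        available.remove(best)  # each comparison teacher is used at most once
    return matches

# Made-up records for illustration only.
program = [
    {"id": "P1", "district": "X", "grade": 4, "baseline": 61.0},
    {"id": "P2", "district": "X", "grade": 5, "baseline": 48.5},
    {"id": "P3", "district": "X", "grade": 4, "baseline": 75.0},
]
pool = [
    {"id": "C1", "district": "X", "grade": 4, "baseline": 60.2},
    {"id": "C2", "district": "X", "grade": 5, "baseline": 49.0},
    {"id": "C3", "district": "X", "grade": 4, "baseline": 66.0},
    {"id": "C4", "district": "Z", "grade": 4, "baseline": 75.1},
]
matches = match_comparison_teachers(program, pool)
print(matches)
```

In this made-up data, teacher P3 has no acceptable match. This is the kind of case in which a close match is unavailable, and the principle above suggests that adding a poorly matched comparison teacher would weaken the study.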






